C++ has become a scripting language (canonical.com)
132 points by alexchamberlain on Oct 16, 2013 | 193 comments

> Ignoring include boilerplate and the like leaves us with roughly ten lines of code.

I'm not sure you get to claim a language is a scripting language and then ignore the boilerplate.

The equivalent python is 4 lines, just one more than the number of steps you're performing.

  import sys
  data = open(sys.argv[1]).readlines()
  lines = sorted(data)
  open(sys.argv[2], 'w').writelines(lines)
C++ is good at many things, but quickly creating readable scripts and live-coding with a REPL are not among them.

The Unix shell does it in 2, counting the shebang line:

  sort < $1 > $2
My preferred scripting language ;)

Requisite C# one-liner:

  Main(string[] args) { File.WriteAllLines(args[1], File.ReadAllLines(args[0]).OrderBy(x => x).ToArray()); }
Excludes using statements, .NET library references, project file, solution file, framework config file...

I have invented a language called Croml, in which all source code compiles to the same program. This program reads in a file, sorts the lines, and writes it back out to another file. An empty source file accomplishes this task.

You are cheating because of all the menus you had to jump through just to make the project :-)

Use LinqPad then it's not cheating. It becomes:

File.WriteAllLines(args[1], File.ReadAllLines(args[0]).OrderBy(x => x));

I did actually use LINQPad. My comment was in jest :P

However, there's no Main in LINQPad which takes string[] args. Also File.WriteAllLines takes an array as its second parameter, while OrderBy returns an IEnumerable.

Don't forget the quotes...

    sort < "$1" > "$2"

It'll work on files (much) larger than RAM, too.
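
For comparison, getting larger-than-RAM behavior in a scripting language takes real work. Below is a rough Python sketch of an external merge sort, the general approach GNU sort takes with temporary files. The names external_sort and chunk_lines are illustrative, and the sketch assumes every line (including the last) ends with a newline and that plain code-point ordering is acceptable:

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dest_path, chunk_lines=100_000):
    """Sort the lines of src_path into dest_path using bounded memory."""
    chunk_paths = []
    try:
        with open(src_path) as src:
            while True:
                # Read at most chunk_lines lines, sort them in memory,
                # and spill the sorted run to a temporary file.
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(text=True)
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)
                chunk_paths.append(path)

        # Merge the sorted runs lazily; heapq.merge pulls one line
        # at a time from each run instead of loading them whole.
        runs = [open(p) for p in chunk_paths]
        try:
            with open(dest_path, "w") as dest:
                dest.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for p in chunk_paths:
            os.remove(p)
```

Memory use is bounded by chunk_lines rather than by the size of the input, at the cost of writing the data to disk twice.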

Absolutely yes, up to the point you're working with text. When objects (either implicit or explicit) start to interact with each other, shell scripting falls short quickly.

'Text' covers a very large domain. Once you get any data to an ordered text form, Unix command line text processing utilities have literally zero competition when it comes to succinctness, power and how quickly you can create solutions.

The utility of shell goes down not because problems are represented as 'Text', but rather because shell languages lack features like exception handling, proper error checking and many other things, which makes it difficult to write large programs in them.

As a next extension, you can learn Perl.

I assure you, after that you will not need anything ever.

> I assure you, after that you will not need anything ever.

I love it when people say things like this. As if their domain is the only domain in all programming.

For surely, you could write a competitive web browser in Perl. Or a first person shooter. Or a telecom system. Or a mars orbiter.

It makes you wonder why anyone ever invented anything else!

I read his post to mean he is talking about the 'Text' domain of things, not everything. He claims you won't want anything beyond Perl if/when you stick to handling text.

Oh. On re-reading, that makes sense. I recant my snark.

I don't actually agree with him but I just wanted to thank you for posting the recant. :)

You know if this kind of thing keeps happening on the internet I might just have to start counting on my toes.

I like perl because of how it looks similar to php, and because it looks similar to php, the structure is somewhat like C. I am not sure how good Python is at text manipulation, but I'll bet it is similar to perl and can do, if not all the things perl can, most things perl can.

Languages I know are: C, Objective-C, Perl, PHP, Javascript, and bash. Languages I played in: Python, Ruby, C++, C#, lua, go.

I want to learn C++ as it has very good cross platform stuff. But don't know when to start. I want to learn python so I can help with mailpile, but don't know when to start. I want to learn go as I want to write my own chat protocol, but don't know when to start.

I am not biased with languages, I just don't know other languages and can't give a pros/cons comparing between them.

> When objects (either implicit or explicit) start to interact with each other, shell scripting falls short quickly.

Not with PowerShell! PowerShell is amazing in how it takes the text pipe model of working and extends it to general purpose objects.

(Having access to all those .NET libraries is a nice bonus!)

You don't need the hashbang if you run it the same way as Python:

  sh script.sh
  python script.py
So one line vs two lines

  import sys
Yep, bash is more readable for small scripts

> Yep, bash is more readable for small scripts

I don't agree with that; in my opinion bash is more concise while python is far more explicit and thus more understandable.

Python is not meant to be written in one line; the script could be rewritten in a readable way like this:

  import sys
  inp = open(sys.argv[1], 'r')
  out = open(sys.argv[2], 'w')

  out.writelines(sorted(inp.readlines()))

This is my favorite python implementation in this thread. I daresay it's beautiful.

Including whitespace it's still half as long as the "11 line" core of the C++ script.

There's an unnecessary 'r'. Combined with the line rhythm established by opening the files in consecutive lines you ensure readers who would have been confused by the 'w' will quickly understand.

And the last line reads like programmer English; by which I mean English with a SVO word order. Not intuitive to the average person but quite parsable by anybody who's used a modern imperative language.

I can't even nitpick your choice of inp instead of input, the rhythm you setup and the contrast with 'out' means it's quite obvious what you mean.

Have an upvote.

English is usually SVO; I assume you meant OVS?

Obligatory functional style:

    import sys

    reduce(lambda i, o: o.writelines(sorted(i)),
           map(lambda args: open(*args),
               zip(sys.argv[1:], ('r', 'w'))))
If Guido didn't despise FP so much maybe we could get a nicer lambda syntax... but here it is anyway.

In my opinion using reduce, map and zip is not a good idea in this case. What are they needed for? I don't even think your approach is more functional than the examples above.

I mean, this one line should be equally functional and .. it's shorter and even more understandable:

  open(sys.argv[2], 'w').writelines(sorted(open(sys.argv[1])))
(Btw you're talking about a nicer lambda syntax but imho your example looks ugly because of all the unneeded stuff you've put into it)

faster too

  $ time sort < t > t-sh
  real    0m0.016s
  user    0m0.008s
  sys     0m0.008s

  $ time py -c "import sys; open(sys.argv[1],'w').writelines(sorted(open(sys.argv[2]).readlines()))" t-py t
  real    0m0.088s
  user    0m0.056s
  sys     0m0.012s
BTW also a different result (I curled this page for the data).

It is faster mainly because of the startup time of the Python interpreter. You can improve performance by passing "-S" to the Python interpreter. I get inconsistent results with time, maybe because of context switching, as stated here [1].

See my results here: http://pastebin.com/HMUErzee

1: http://stackoverflow.com/questions/9006596/is-the-unix-time-...
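
To separate interpreter startup from the sorting itself, the two can be timed independently. A rough sketch, assuming Python 3 and a synthetic 100,000-line input (the helper name time_call is made up for illustration, and the numbers will vary from machine to machine):

```python
import subprocess
import sys
import time


def time_call(fn):
    """Return the wall-clock seconds taken by a single call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


# Synthetic input: 100,000 short lines held in memory.
lines = [f"{n % 997}\n" for n in range(100_000)]

# Cost of the sort alone, inside an already running interpreter.
in_process = time_call(lambda: sorted(lines))

# Cost of starting a fresh interpreter that does nothing.
startup = time_call(
    lambda: subprocess.run([sys.executable, "-c", "pass"], check=True)
)

print(f"sorting in-process:            {in_process:.4f}s")
print(f"starting an empty interpreter: {startup:.4f}s")
```

For small inputs the second number tends to dominate, which is the effect described above.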

I may be wrong but I think that this is actually spawning a subprocess and thus is not an exact translation of the python example.

This bash script rewritten in python should be something like this:

  import subprocess, sys
  inp = open(sys.argv[1], 'r')
  out = open(sys.argv[2], 'w')
  sys.exit(subprocess.call('sort', stdin=inp, stdout=out, stderr=sys.stderr))

(I don't know why, but it seems that the performance of this version is worse than the originally proposed example with writelines/readlines)

So what? The Unix Shell library consists of all the executables that are on the system - for me this is the beauty. There is not much abstraction between the Unix Shell and the system under it. And shell is very tolerant of other programming languages as well - as long as they allow data to be represented in text form and through streams or files ;)

Yes, I know that this is the correct way of doing it in bash. I posted this because someone might test the speed of the two scripts and conclude "bash is faster" while they actually measured the speed of "sort" probably.

Microbenchmarks are a distraction. Macrobenchmarks matter.

The real problem with bash is the mess you get when you start needing whitespace or arrays or error handling or non tabular objects or a computation not already implemented as a system program.

It doesn't have to spawn a subprocess. Some shells (like those in Busybox and Toybox) reduce common POSIX userland tools to a function call.

Right. And even if it was implemented as a subprocess, it's still idiomatic shell code, i.e. what you would Actually Do.

You can even just write

  lines = sorted(open(sys.argv[1]))
since Python files act as iterators over lines.
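
A quick way to see this without touching the filesystem is to use io.StringIO as a stand-in for an open file (an assumption made here for the sake of a self-contained example; a real file object behaves the same way when iterated):

```python
import io

# io.StringIO stands in for a file opened with open(path); both yield
# one line at a time, newline included, when iterated.
fake_file = io.StringIO("pear\napple\nfig\n")

# sorted() accepts any iterable, so the file object can be passed directly.
lines = sorted(fake_file)
print(lines)  # ['apple\n', 'fig\n', 'pear\n']
```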

Well, if you want a REPL for C++, there is always ROOT. Though, for the love of all that is holy in this world, I don't know why they chose to have a REPL for C++. Or why they chose the profoundly ungoogleable name of "ROOT".

If you instead search for 'root CERN' the first result is the following:


It is actually called CINT.


Or one liners:

import sys; open(sys.argv[2], 'w').writelines(sorted(open(sys.argv[2], 'w').writelines())

(use-package :com.informatimago.common-lisp.cesarum.file) (setf (string-list-text-file-contents outpath) (sort (string-list-text-file-contents inpath) (function string<)))

Indeed, what kills languages like C or C++11 in the scripting domain, is the need for declarations and other kind of boiler plate. That said, with C++11 there are means to write a library requiring less declarations, but it's still a cultural problem (beside the hard work that it would require).

Your first example is erroneous - perhaps a good example of why you shouldn't try to squeeze everything into a single line of code :)

    import sys
    src = open(sys.argv[1])
    dest = open(sys.argv[2], 'w')
    dest.writelines(sorted(src))

It's not really boilerplate in the classical sense, he's just talking about headers and lines that have only a brace on them. I think it's a fair statement.

Plus, he could have reduced lines even further to get down to about 6, like

   vector<string> data;
   string line;
   for (ifstream ifile(argv[1]); getline(ifile, line);)
      data.push_back(line);
   ofstream ofile(argv[2]);  // named stream: ostream_iterator cannot bind to a temporary
   sort(begin(data), end(data));
   copy(begin(data), end(data), ostream_iterator<string>(ofile, "\n"));
And his `return 0` was superfluous so I removed it.

Unfortunately we work with iterator pairs in C++; if we had ranges like D does, we could turn those last two into

   copy(sort(data), ...)
but alas we cannot.

You could use boost::range to make it even more succinct

Once you remove the redundancy, you'll see it is pure information!

I use C++ every day but I take objection to the following claim

"powerful string manipulation functions"

boost::algorithm::split (and the rest of algorithm, frankly) is unintuitive to use. Regex requires looking up the syntax every time. No encoding/unicode support. Literally 100's of popular string libraries, all incompatible, and that's not even counting all the homebrew. char* everywhere so lots of copying to work with string objects (which is what all of the algorithms work on). Dozens of different and slightly different ways of converting other types and object to/from strings. Poor string formatting functionality, and the solutions that exist are verbose and cumbersome (strstream, boost::format, ...) .

I still love the combination of low-level power and high-level abstraction that C++ provides, but string handling is one of the most problematic areas, in my experience (which is of course colored by the type of work I do, but still).

Yup. C++ isn't a terrible language for scripting, it's mostly the fact that it lacks a single sleek standard library like C# or Python or whatnot offer for normal shell-scripting tasks.

The problem with that is that any such "sleek standard library" doesn't translate the existing (massive) body of C++ code. Using Qt is already much better than the standard library though.

Completely agreed - working in areas that I do char* style strings are still commonplace, and that does make you appreciate boost or std::string a great deal more. But it does not hold a candle to the ease of string manipulation in a lot of scripting languages.

The API that c0 [0] provides is closer (though still low-level) to what is provided by more modern languages.

[0] http://c0.typesafety.net/

Strings are a known problem in C++. Wish the standard committee could end this string nonsense once and for all. Until then I'll just stick to std::string and char *.

I think this is just helping perpetuate an artificial meaningless classification of languages.

This entire "scripting language" vs. ? ("general purpose language"? "systems programming language"?) is really not well defined in the first place.

What makes some language a scripting language? Is Python a scripting language or a general purpose language? Scripting for a specific platform? "General" scripting language?

I think most people think of "scripting languages" as languages used for automating small tasks that aren't suitable (the languages, that is) for large applications. By that definition any general purpose language is automatically a "scripting" language but not the other way around.

And what does having a REPL vs. not or an interpreter vs. a compiler have to do with any of this?

> And what does having a REPL vs. not or an interpreter vs. a compiler have to do with any of this?

A lot. As a class, languages that can be executed by sending program text to stdin without polluting the execution environment are suitable for a whole class of programming techniques that languages that can't aren't.

For example, heredoc-ing to inline one language's code into another, generating code at runtime, or executing on another machine over ssh are much trickier propositions in Java or C++ than they are in Awk or Python.
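
A minimal Python sketch of the "send program text to stdin" technique: the program is generated at runtime and piped to a fresh interpreter without being written to disk. The helper name run_generated is hypothetical, and the sketch assumes Python 3.7 or later for capture_output:

```python
import subprocess
import sys


def run_generated(code):
    """Send program text to a fresh interpreter on stdin; return its stdout."""
    result = subprocess.run(
        [sys.executable, "-"],  # "-" tells python to read the program from stdin
        input=code,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Program text built at runtime from local data.
values = [3, 1, 2]
code = f"print(sorted({values!r}))\n"
print(run_generated(code))
```

Swapping the command for an ssh invocation that starts a remote interpreter would run the same text on another machine, assuming an interpreter is installed there.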

REPL availability matters because it's common to use one or more REPLs as a primary UI to a machine. Lots of people use Bash or another *sh, some people use Python or a Lisp, but I've yet to hear of anyone using a C++ REPL as their shell even if such a thing may exist.

Theoretically, these are properties of the implementation and not the language, but language features tend to be so coupled to that implementation decision that it doesn't matter. (go with its 'go run' is a maybe-exception)

As you say those are implementation details.

We also can't have any meaningful discussion using your definition. What does "polluting the execution environment" mean? Are we restricting the discussion to platforms/languages where "stdin" has a meaning?

By its nature, a scripting language's job is to "pollute its environment", i.e. to perform some modification of the state of the system it is scripting. A scripting language is most certainly not a "filter", something that takes some input via a pipe and produces some output.

Stdin isn't important, but the ability to treat the whole environment as a process that takes source code as an input is.

Mutating the environment isn't necessary to be a useful program. The techniques of "move the code to the data, not the data to the code" and "share by communicating, don't communicate by sharing" depend on this.

The actual implementation behaviors of C and Java prevent me from treating a remote system as an abstract, environment-free machine, at least without doing a ton of tooling. The almost unavoidable, unwanted side-effects on the local filesystem due to executing source code is what I'm referring to as pollution.

We're not going to get into a functional vs. imperative discussion I hope ;-)

So if Python makes a .pyc file that's not pollution but gcc making an a.out is? How about we create a RAM disk, compile into that, and dispose of it after we're done?

A compiler is a process that takes source code as input. You simply need to draw your circle a little larger.

The reality is that the lines are blurry, definitely more blurry today than they were in 1998 (that paper that was referred to). They are blurrier because computers are faster and have more storage, compilation is now more of a continuum with JIT, and there are many languages that straddle multiple categories.

I've used C for "scripting", e.g. a "quick and dirty" parse some files and spit out some results and I use Python for "production" style very large applications. C++ has become more expressive and safer but I'm not sure what we get by saying it's a "scripting language".

http://bellard.org/tcc supports executing C programs "directly". By piping things through clang, for example (a number of C++ compilers can still translate to C), it can be made to execute C++ programs directly in memory, "just like" python or perl or ...

A spectacular use of this ability is in using tcc as a linux bootloader. Instead of loading vmlinuz, it loads the C source for the kernel, compiles it, and boots the result. It doesn't even need an operating system (try that with python).

> I've yet to hear of anyone using a C++ REPL as their shell even if such a thing may exist.

Actually, it seems such a thing does exist, and there are quite some people using it (like, CERN) -- see a recent link from proggit for a C REPL called "CINT" (and its top comments for a C++ REPL dubbed "Cling") at:


It isn't well defined? I always thought a scripting language was evaluated/compiled at runtime (therefore slower, often allowing dynamically generated code to be executed, no low-level memory management like pointers), versus traditional languages which are compiled (either to assembly, or some intermediate representation, but dynamically generated code in the original language is necessarily out of the picture).

Obviously the name "scripting" comes from the fact that such languages are intended for automating ("scripting") interaction with "objects" (application scripting, HTML scripting, shell scripting, etc.), and that high-level features and on-the-fly execution are more important than performance.

I've never met anyone who thought Python wasn't a scripting language -- I mean, you run the Python interpreter on the script file. And I've never met anyone who would call a compiled language (C, Java, etc.) a scripting language. The distinction is pretty clear to me; maybe other people can think of counterexamples?

Python is compiled into Python byte code. Is Scala compiled or interpreted? (it has a REPL...)


Do we need to compile to machine languages? What about a VM? What about the Python VM is different than the Java VM or the .NET VM? Is C# therefore a scripting language or a ______ language. (fill in the blank)

ActionScript? What about JIT compilers? JavaScript?

At any rate, I think you're trying to say scripting language == interpreted language but there are probably interpreted languages (let's say Prolog) that you wouldn't call scripting languages and there are compiled languages that can be used for "scripting". I agree that typically scripting languages are interpreted. But I would call Python a general purpose language and not a scripting language.
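
To make the bytecode point concrete: CPython's built-in compile() produces a code object, which the dis module can disassemble. A small sketch (the variable names are illustrative, and the exact instructions printed vary by Python version):

```python
import dis

source = "lines = sorted(data)"

# compile() turns source text into a code object holding CPython bytecode.
code_obj = compile(source, "<example>", "exec")
print(type(code_obj).__name__)  # code

# Show the bytecode instructions the interpreter loop will execute.
dis.dis(code_obj)

# The code object can then be executed against a namespace.
namespace = {"data": ["b\n", "a\n"]}
exec(code_obj, namespace)
print(namespace["lines"])  # ['a\n', 'b\n']
```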

> Do we need to compile to machine languages

And then, everything always eventually compiles to machine code, some just do it at runtime!

Well, there is a distinction between the language and the language implementation. Being scripted or compiled is a feature of the implementation, but a lot of people think that it's a feature of the language. It is kind of a pedantic point to make, but there are, for example, CINT (http://en.wikipedia.org/wiki/CINT), which is an interpreter for C/C++, and BeanShell (http://en.wikipedia.org/wiki/BeanShell), which is an interpreter for Java.

Being executed through an interpreter, compiled to native code or something in-between (intermediary code) are several possible implementations of a given language.

In the 80s, the same language, BASIC, could be either interpreted at runtime or compiled to intermediary code. And there was even a C language interpreter.

There are at least two major C and C++ interpreters today.

> It isn't well defined?

Have you seen Oberon?

> What makes some language a scripting language?

A surprisingly difficult question that I've wrestled with a lot (I did a PhD in "compilers and scripting languages" and people can be quite picky about semantics!). Here's how I think about it:


Let's attempt some sort of a definition

A scripting language is a language L with respect to an environment E where a sequence of operations that can be performed manually in E (a script) can also be performed by invoking a program in language L.

A stronger definition can involve multiple environments E for the same language.

A general purpose language L with respect to an environment E is a language in which every possible application that can run in E can be performed by invoking a program written in language L.

Then again we can strengthen this definition by including multiple platforms or environments.

Most people I know use "scripting language" in a derogatory way, "it's only a scripting language, you can't use it for real applications" and often with respect to perfectly good general purpose languages (which is why this approach isn't always constructive).

There are definitions well accepted by those that care to study programming languages in terms of Computer Science.

The problem is that many discuss what they think a certain programming language is, without the proper bases to do so.

What is the CS definition for "scripting language"?

They usually share a set of features in terms of typing, extensibility, ability to be used more as glue language than real applications, interactivity

A well-known paper that discusses those capabilities is John Ousterhout's "Scripting: Higher Level Programming for the 21st Century", published in IEEE Computer in 1998.


Nice backpedal from "definitions".

It is a paper for a well regarded Computer Science institution, not the opinion of the guy on the corner.

Do you want some kind of ISO/ANSI standard definition?

I want you to read the paper and understand how it talks about various language characteristics that a given language can embody to varying degrees.

The four points on how "scripting languages differ from classical compiled languages" would mean even Java qualifies as a scripting language, and I've never seen anyone credibly claiming that.

Wikipedia is more clear on this: http://en.wikipedia.org/wiki/Scripting_language "it is uncommon to use Java as a scripting language due to the lengthy syntax and restrictive rules about which classes exist in which files" -- I'd say this applies to C++11 as well.

I think that's the real point; c++ is becoming Java.

Er... No! No GC and multi-paradigm.

I'd like to add the following criteria:

* No static types

* No need to compile

* Having a REPL

F# has static types, but it feels pretty light for doing scripting. It's got a REPL and a script mode to execute things without an explicit compile phase. Perhaps "doesn't require heavy type annotations" is a better criterion than no static typing.

Agree with you, without knowing F#. I guess F# is related to Haskell and in Haskell you have static types, but you need them only where the compiler/interpreter is not able to infer them.

F# has some inheritance from OCaml and ML. Much older than Haskell. And type inference can be done in many languages. There's no reason why, for instance, Java and C# require type annotations all over the place. They could add in type inference everywhere, although it'd probably mean an overhaul of the compiler, and it wouldn't work in every case (overloading).

> There's no reason why, for instance, Java and C# require type annotations all over the place.

Wouldn't their type systems be a big reason why? ML and Haskell have type systems very different from Java specifically because they went for systems that were inferable. F# has an iffy "just assume it is int" step to make it work with the C# type system. F#'s inference algorithm doesn't work as well, and it impacts how the language gets used. For example people tend to overuse the pipe operator because it helps the inference engine get the right type without annotations.

F#'s "assume it is int" is only for a few operators, such as +, as a convenience. It has nothing to do with interop with C# at all.

F# inference is left-to-right, which is one reason to use the |> operator, yes. F# had additional type inference, for instance, accessing members on a binding would infer object types, but they removed that. Haskell has a more complete type inference system.

I'm not seeing anything in C# that prohibits inference of types for fields, methods return types, or parameters. The type system C# has is essentially a subset of F#.

awk and sed don't have REPLs, and they are undoubtedly scripting languages.

I tend to say that sed could be considered as a domain-specific scripting language usable to edit text. awk is a more generic language - to my shame I used it only for parsing and translating text - even without a REPL you can 'test' quite a lot of constructs by providing the expressions to awk as an argument.

I concede that REPLs is not a must for a scripting language, but it will definitely make it more enjoyable ;)

The definition of a scripting language is that the source code (the script) is the program.

Would be great if somebody could post Python/Ruby/.. code here that achieves the same, just to compare. Even with all the C++11 additions it never felt like scripting to me, and I use it practically every day.

Small nitpick to the author: please define variables when you're going to use them, not C-style all at the beginning of the function. It makes code easier to understand. It doesn't make the reader wonder 'hey what's this variable going to be used for' then having to crawl through all code underneath it. It creates code that's easier to refactor. Also, but more arguably for such a small sample: http://stackoverflow.com/questions/1452721/why-is-using-name...


  import System.Environment
  import Data.List

  main = do
    infile:outfile:_ <- getArgs
    input <- readFile infile
    writeFile outfile . unlines . sort . lines $ input

It's a bit odd way of matching items of a list, I'd prefer to specify it without any possible trailing items:

    [infile, outfile] <- getArgs
Also, the last two lines may be squeezed into one (though less readable):

    writeFile outfile =<< (unlines . sort . lines) `fmap` readFile infile

On your first point; I was matching the behaviour of the C++ program in the blog post, which allows any number of command line arguments but only uses the first two.

On your second point, I actually thought about writing the whole thing as

  getArgs >>= \a -> readFile (a!!0) >>= writeFile (a!!1) . unlines . sort . lines
but decided that that's exactly the kind of thing that gets Haskell programmers a bad reputation.

Agreed, it does and you made the right choice. But in reality once you know Haskell - bind is a very common operation and reading this version is very natural.

Here's a Perl example. This isn't optimally compact, but this is more or less how I would write a script like this if I had to put it into production at $work (maybe with an extra check to make sure that exactly two command line arguments were provided).

  use strict;
  use warnings;

  open my $read,"<",$ARGV[0] or die $!;
  my @lines = <$read>;
  close $read;
  open my $write,">",$ARGV[1] or die $!;
  print $write $_ foreach sort @lines;
  close $write;

Perl5 v12 and up:

    use v5.12;    # enables strictures
    open my $in,  "<", $ARGV[0];
    open my $out, ">", $ARGV[1];
    print $out sort <$in>;

    sub MAIN( $in, $out ) {
        spurt( $out, open($in, chomp => False).lines.sort.join )
    }
[EDIT: fixed typo in first example]

Here's a couple of interesting Perl alternatives:

  my @files = map { IO::File->new($_->[0], $_->[1]) || die $! } 
              ([shift, 'r'], [shift, 'w']);
  $files[1]->print( sort $files[0]->getlines );
  $_->close for @files;

  use autodie;
  open my $out, '>', $ARGV[1];
  print {$out} sort do {open my $in, '<', $ARGV[0]; <$in>};

If version >= 5.10 you can drop the 'or die' and for that matter neither of the close statements are necessary(in any recent version).


  array = IO.readlines ARGV[0]
  File.write ARGV[1], array.join("\n")
edit: get input and output file names from command line

If we want to golf it, it would probably just be

  lines = IO.readlines ARGV[0]
  File.write ARGV[1], lines.sort.join('\n')
certainly more readable than the C++ even if you're unfamiliar with ruby.

Well, if you really want to golf it, you can just write:

  File.write ARGV[1], IO.readlines(ARGV[0]).sort.join('\n')

You're fired.

    IO.write(ARGV[1], ARGF.lines.sort.join)

Here's the shortest I've gotten after looking at some of the comments:

And yes, it does work even without the space between `write` and `$*`.

Also after testing, I realized the ('\n') is not required for join. When you call 'lines', it still has the '\n' character in the string, and when you join, it defaults to join without a delimiter, so it's putting them back together with the newline still there.

Wow, this is really sweet and esoteric (didn't know about ARGF) - but, reading the docs for ARGF, wouldn't it also try to read from ARGV[1]?

Maybe this would fix it:

  IO.write(ARGV.pop, ARGF.lines.sort.join('\n'))

Yes, good catch. join('\n') is wrong though - IO#lines preserves newlines, the default join is the empty string, and you meant "\n" :)


I believe you still need to join on newline, the default is space. However you can save a couple characters by using `$*` instead of ARGV.

Also, trim out the space between the arguments and kill the parenthesis for the optimal golfing.


> File.write ARGV[1], (IO.readlines ARGV[0]).sort.join('\n')

not valid Ruby?

  File.open(ARGV[1], 'w') { |file| file.write(File.read(ARGV[0]).lines.sort.join) }

The thing is, in Ruby you are already in the main method, so there's no need to declare a main function as an entry point. The C++ version is longer mainly for historical and performance-related reasons. C++11 could have been designed with a smaller standard library, but that would have broken old code. Obviously, though, "less code != better code"; what you want to achieve is more important. Btw, C++ as a scripting language? At first you might think so, but really that's a big lie :)

I'm not arguing with that ;)

BTW, I would never use Ruby for anything that needs performance, but for ease of use and readability it's really great.

Now that's the succinctness I'd expect from an actual scripting language

Except that's not what the C++ code does. The C++ reads and writes from files given as command line arguments.


Thanks. I know enough ruby to be dangerous so didn't know if more needed to be done than that simple substitution.


   ruby -e 'puts ARGF.sort' file.txt
Also, this reads stdin if the argument is omitted.

> Would be great if somebody could post Python/Ruby/.. code here that achieves the same, just to compare.

It's actually really hard to completely rewrite that program in most scripting languages: they tend not to have the same concept of undefined behaviour.

'sys.argv[2]' in python (with sys.argv = ['thing.py']) is fully defined (raises an IndexError). 'argv[2]' in C++ (with argv = (char*[2]) { "./thing", 0 }) is undefined.

If your scripting language has an FFI library (ctypes in python, say) then you could probably do something equivalent.
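
The defined half of that comparison can be sketched in Python. Here second_arg is a hypothetical helper that takes the argument list explicitly, so the behavior can be shown without relying on the real command line:

```python
def second_arg(argv):
    """Return argv[2], or None when it is missing."""
    try:
        return argv[2]
    except IndexError:
        # Python defines this outcome: indexing past the end of a list
        # always raises IndexError, so the failure can be handled.
        return None


print(second_arg(["thing.py"]))                       # None
print(second_arg(["thing.py", "in.txt", "out.txt"]))  # out.txt
```

The C++ program has no equivalent guarantee to rely on, which is why an exact translation is hard to write.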

Defined behavior is a subset of undefined behavior. Specifically, "raises an IndexError" is a perfectly legal thing for a C++ implementation to do in this case. Approximately zero real world implementations do this, but nothing says they can't.

Good point about the UB. Together with shin_lao's comment about the error/exception handling, it basically sums up why the author's claim only makes some sense, but not a lot.


    import sys

    data = open(sys.argv[1]).readlines()

Not bad, however with the ''.join(data) you're effectively doubling the amount of memory needed for large files, because it will build the entire output string in memory (and you've already got the list in memory). Better to use writelines(). You can also use sorted() to iterate over and sort the input lines automatically:

   sorted_lines = sorted(open(sys.argv[1]))
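
Put together, the writelines()/sorted() suggestion might look like the sketch below. The function name `sort_file` is made up for illustration, and like the other Python versions in this thread it assumes the last input line ends with a newline. The sorted list is still held in memory; the point is only that no second, joined copy of the whole output is built.

```python
def sort_file(src_path, dst_path):
    """Write the lines of src_path to dst_path in sorted order."""
    with open(src_path) as src:
        # sorted() consumes the file object line by line and returns a list.
        sorted_lines = sorted(src)
    with open(dst_path, "w") as dst:
        # writelines() writes each line in turn, so no ''.join() is needed.
        dst.writelines(sorted_lines)
```

In a script this would be called as sort_file(sys.argv[1], sys.argv[2]) after importing sys.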


  function Sort-File ($Path, $Destination) {
    Get-Content $Path | Sort-Object | Out-File $Destination
  }
Granted, shells are probably at an advantage when it comes to file management. I'd think a bash variant would be quite short as well.

Aiming for succinctness by using aliases and using redirection instead of cmdlets for writing the file (shorter, but less flexible):

  function sf { gc $args[0] | sort > $args[1] }
That's not code anyone should write outside code golfing, though ;-)

Yeah, the bash variant is obscenely short as well. Short enough that it isn't worth having as a separate program.

    cat in.txt | sort > out.txt

Curious, in case I'm missing something. Is what you've written equivalent to the following, or is there a reason for the pipe?

    sort in.txt > out.txt

For those that don't know: there is/was an ancient meme called the "Useless Use of Cat Award" about that subject. See for instance http://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat, http://partmaps.org/era/unix/award.html, or http://unix.stackexchange.com/questions/16279/should-i-care-...

From the other comment, it is completely equivalent, and I will admit that the pipe is due purely to my ignorance. I usually use `sort` for sorting the output of other commands, and so I forget that it can open a file on its own.

That usage of cat is "wrong" in general:

    cat file | command > out
is the same as

    command < file > out

Don't worry about the cat; it does have a purpose: portability. cat in.txt | sort > out.txt works in PowerShell (cat is an alias for Get-Content and sort is an alias for Sort-Object); without the sort it wouldn't work.

No need for cat!

    sort in.txt > out.txt

We are talking about a script, so it should be:

    cat "$1" | sort > "$2"



Unix shell:


Another way to do it in C++ is:

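  #include <algorithm>
  #include <fstream>
  #include <iterator>
  #include <string>
  #include <vector>

  using namespace std; // Bad but shorter!

  int main(int argc, char** argv) {
      ifstream infile(argv[1]);
      ofstream outfile(argv[2]);
      // Note: istream_iterator<string> splits on whitespace, so this sorts words, not lines.
      istream_iterator<string> start(infile), end;
      vector<string> lines(start, end);
      sort(lines.begin(), lines.end());
      copy(lines.begin(), lines.end(), ostream_iterator<string>(outfile, "\n"));
  }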

>Small nitpick to the author

ifile, line and data have to be defined before the loop. ofile could be defined closer to usage, but I think it's much clearer where it is, grouping in and out defs together making it obvious how argv is used for each.

In Rebol:

  Rebol []

  files: map-each n system/options/args [to-file n]
  write/lines files/2 sort read/lines files/1

Python (assuming that the last line ends with a newline):

  import sys
  open(sys.argv[2], "w").writelines(sorted(open(sys.argv[1])))

every single line of code is clear, understandable and expressive

It's been a while since I looked at C++ but that statement doesn't apply to me. For example:

    for(const auto &i : data) {
      ofile << i << std::endl;
    }
I have no idea what that's doing or why it makes any sense. If I knew C++ better maybe that wouldn't be the case but, for example, the Ruby example someone else provided is obvious to me (and I don't work with Ruby).

It feels there's a lot of telling the machine how you want to do something going on in here (instead of what you want it to do).

I'd be really interested in hearing how this sample differs from a previous C++ implementation.

I think people sometimes feel compelled to combine auto and the range based for loop when they shouldn't. In this example the range based for loop adds clarity but the use of auto doesn't. I would have written something like:

    for(const string &s : data) {
        ofile << s << std::endl;
    }
which is a lot more clear about what type of thing you're getting out of the vector on each iteration.

Actually, the only thing this tells you about what you are getting out of the vector is that it can be implicitly converted to a string.

If data is a vector of, for example, pointers to const char, on each iteration this code will unnecessarily copy each item into a temporary string before printing it. Using auto would avoid this step, regardless of data's type.

Note that the typical non-range-based for loop also omits the vector item type... Is this that much less clear?:

  for (int i = 0; i < data.size(); ++i) {
      ofile << data[i] << std::endl;
  }

That's a good point, but when I think of the typical code for non-range-based for loop it looks like:

    for (vector<string>::iterator i = data.begin(); i != data.end(); ++i) {
        ofile << *i << std::endl;
    }
which makes it pretty clear what types we're dealing with.

I think in C++1y we'll be able to define a function 'foreach' that we can use like:

    foreach(data, [&]<class T>(const T & item) {
        ofile << item << std::endl;
    });
This should eliminate the potential implicit conversion to temp. But I'm not sure it's seriously better than the original range based for with auto.

Traditional C++ would look something like this:

    for (vector<string>::const_iterator it = data.begin(); it != data.end(); ++it) {
      ofile << *it << endl;
    }
But in this case we're dealing with a vector, so you don't need iterators:

    for (unsigned i = 0; i < data.size(); ++i) {
      ofile << data[i] << endl;
    }

Ugh, define need. If sanity is required, stick to iterators.

Iterators can get very ugly. I like this when there is a need to access elem a lot inside the loop:

  for (size_t i = 0, iEnd = elems.size(); i < iEnd; ++i) {
      Elem &elem = elems[i];
      // deal with elem
  }
Of course the new C++11 syntax is king.

I think that some amount of knowledge was assumed. You would have understood it if you had known Java. For-each loops in Java have a very similar syntax.

You probably know something that is similar to Ruby, and that's why you understand it without ever learning it. I bet that you wouldn't understand it if you didn't know any programming language at all.

In some languages that 23 line program is one line. That's way more important than whether a language is a scripting language.

Maybe it's true that C++ has become less verbose and more easily flexible than it once was through some of the additions of the C++11 standard. I'd argue that a scripting language is not defined by these criteria though, a scripting language is not compiled down to a binary by definition - that's what defines it.

Semantics though.

The real differentiator is that there is no user-visible compile step. Scripting languages can be compiled to binary, through a JIT or even AOT, but when this happens, it's hidden from the user. Consider Python and Java: both are compiled to binary formats, but Python hides this while Java does not. It is common to call Python a scripting language, but it has been a long time since anyone said that of Java (and even back when they did, it was intended as an insult more than as a technical classification).
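
To see the hidden compile step in CPython, the sketch below compiles a source string to a code object by hand and then runs it, which is roughly what the interpreter does for you when you run a script. It assumes a standard CPython interpreter, and the variable names are purely illustrative:

```python
source = "total = sum(range(10))"

# compile() turns source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")

# exec() runs the already-compiled code object in the given namespace.
namespace = {}
exec(code, namespace)

# co_code is the raw bytecode; the user normally never sees this step.
print(type(code.co_code).__name__, namespace["total"])
```

Nothing here requires a separate build command, which is the user-visible difference being described.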

C++ is adopting some of the properties of scripting languages, but I'm not aware of any implementations that have removed the explicit compile step. I don't think it's really accurate to call C++ scripting as long as that step is still required, and I haven't seen very much interest in taking it out.

A little bit of command line foo would fix that. If you really wanted to go gung-ho you could use Linux's binfmt_misc to do it directly to .cpp files for you.

And before you complain it's not the same thing, that's basically all a scripting language does. The compilation step is hidden inside the command wrapper.

Already done: http://www.netfort.gr.jp/~dancer/software/binfmtc.html.en

Also of note: does more than just C/C++: FORTRAN, Java, Pascal, even assembly; and includes a "realcsh" and "realksh" for that C and kernel REPL you've been craving. Just "apt-get install binfmtc". I test out quick little things in C++ with this all the time.

Would it, though? C and C++ introduce a lot of complexity in the compilation stage that is hard to hide without leaking much.

You have to import the headers, using include guards if your "script" spans multiple files. Any external headers have to be in the header search path, libraries have to be explicitly linked and also be in the linker path, you have to have a makefile or similar in order to manage the compilation complexity.

If you are using, say, python, all you have to do is add the shebang, "import" the desired packages and you are good to go. One has to install the eggs, packages, or whatever the name is beforehand, but after that you can just use them.

Eh, it's all semantics; nearly any language can be compiled or interpreted, and while it is verbose, it's not insurmountable to make scriptable C++; I've got some templates I tweaked with long enough to make them pretty straightforward to cut and paste and use with binfmtc (see my other post) to quickly test things (I may have to post those templates . . . ). I also have a template for Python that does similar things, because in all honesty, while it might slow down learning a bit, at least I'm learning the correct way to do things by always having warnings cranked to the max.

There is a C++ REPL called Cling: http://root.cern.ch/drupal/content/cling

It is made by CERN and based on Clang.

Agreed that C++11 is better than C++Original. But no. It's still 19 non-blank lines, all of which you have to think about and maintain. Compare this to an idiomatic version written in a "real" scripting language like Python.

   import fileinput, sys
   for line in sorted(fileinput.input()):
       sys.stdout.write(line)
Advantages over C++:

* Fewer import/using lines (5 lines in the C++ version).

* No variable declarations (4 lines in the C++ version).

* Automatic iteration over file or file-like objects is very nice. No need to build a list via getline() and the terribly-named "push_back()" function.

* No "data.begin(), data.end()" parameters to the sort function -- sane defaults, people.

* "for line in file" is so much easier to read than "for(const auto &i : data)".

* Return 0 is implicit. Explicit is better than implicit, I know, but this is a very sane default. If there's an exception, Python won't return 0.

* Better automatic error handling. What does the C++ version print if a file doesn't exist or if there's a read error?

* Thanks to Python's standard library (fileinput module), it automatically handles stdin, multiple input files, etc.

How about

    #include <iostream>
    #include <iterator>
    #include <algorithm>
    #include <fstream>
    #include <string>
    #include <vector>

    using namespace std;

    // Wrapper type so that operator>> reads a whole line instead of a word.
    struct Line
    {
        string lineData;
        operator string() const
        {
            return lineData;
        }
    };

    std::istream& operator>>(std::istream& str,Line& data)
    {
        std::getline(str, data.lineData);
        return str;
    }

    int main(int argc, const char * argv[])
    {
        ifstream ifile(argv[1]);
        ofstream ofile(argv[2]);
        vector<string> data;
        copy(istream_iterator<Line>(ifile), istream_iterator<Line>(), back_inserter(data));
        sort(data.begin(), data.end());
        copy(data.begin(),data.end(), ostream_iterator<string>(ofile, "\n"));
        return 0;
    }
And to inform everyone. Cling as a C++ REPL exists. http://root.cern.ch/drupal/content/cling

You are including way more lines of code... I don't get your point... sorry :(

I eliminated the loops and did the processing with STL algorithms and iterators. It is ugly as sin if you try to process lines like the original code. Most scripting languages do rather well with lines but C++ does not.

Hah, I was just about to post more or less exactly the same code. :)

It's funny, had the author tweaked the problem just slightly to try make C++ look good by saying that the program should output sorted words instead of lines, you could have deleted the entire Line nonsense. Had the problem been to output sorted, unique words, you could have made the rather elegant:

    int main(int argc, const char* argv[])
    {
      using input = istream_iterator<string>;
      using output = ostream_iterator<string>;
      ifstream inputfile(argv[1]);
      ofstream outputfile(argv[2]);
      set<string> words{input(inputfile), input()};
      copy(words.begin(), words.end(), output(outputfile, "\n"));
    }
But then again, it's not exactly in C++'s favor as a scripting language that copying words is easy, while lines is hard.

Yeah, it is not a scripting language for processing lines. The Line nonsense does have a benefit, though: it could be adapted to parse out records and, by overloading the less-than operator, to sort on a specific column. But then you might use AWK instead.
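
For comparison, sorting records on a specific column in a scripting language might look like the sketch below. `sort_by_column` is a hypothetical name, and the sketch assumes every line has at least `column + 1` fields (a shorter line would raise IndexError) and that fields compare as strings:

```python
def sort_by_column(lines, column, sep=None):
    """Sort lines by one field; sep=None splits on any run of whitespace."""
    return sorted(lines, key=lambda line: line.split(sep)[column])


records = ["b 2\n", "a 3\n", "c 1\n"]
# Sort on the second field (index 1) rather than on the whole line.
print(sort_by_column(records, 1))
```

The key function plays the role that the overloaded less-than operator would play in the C++ version.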


The thing which (to me) makes a scripting language /better/ for actual scripting, is that it's run from sourcecode.

In unix terms, if the file starts with '#!'

Why? Because I can write a 4 line BASH script which I plonk into /usr/local/bin and it just works. When I want to check why something happened on the filesystem, I can open the script in place, and step through it in the REPL.

No faffing around with trying to figure out where the source code is, no compile/link/whatever...

Of course, while developing something a little more serious in a scripting language, I use a linter+unittests in pretty much the same way as a compiler... but that's beside the point.

Haskell is usually much more concise than C++, but I don't consider it a scripting language. (OK, I could use an interpreted Haskell, I suppose... Just as you could use a BBC micro ASM interpreter in a VM... but whatever. It's just silly)

A teddy bear cut lengthwise makes a good pair of house slippers.

Must be bring your own garbage collector to work day.

Just curious, what C++11 features are used here? It's been a while since I flexed my C++ muscles, but all of this looks like the standard C++.

Admittedly, even old style C++ with Boost does look a lot like a scripting language (except for the segfaults).

  for(const auto &i : data) {
    ofile << i << std::endl;
  }
The for loop definition there with the auto keyword. The rest has been standard for donkey's years.

Ah, nice to see that becoming mainstream. I glossed over it since it's so similar to:

    BOOST_FOREACH(string i, data) {
      ofile << i << std::endl;
    }

Except the posted C++ code does not correctly handle memory errors or exceptions thrown by the STL.

I love C++, but let's not make stupid claims.

All the posted python/ruby/etc... above has the same problem. A OOM condition will terminate them all, which is generally the desired behavior (and thus the reason that process termination is the default handler for OOM conditions in all environments).

What error specifically are you looking to see handled that wouldn't be exactly equivalent in your scripting language of choice?

Hell, in Linux, an OOM somewhere else on the system could terminate a properly written program with error handling.

I disagree strongly with a couple of your points.

- No need to manually manage memory.

That isn't true at all. You know enough memory management to avoid memory issues in your example. C++11 didn't save you from needing to understand it -- you understand it so avoided needing it.

- Compile time with -O3 is roughly the same as Python VM startup and has to be done only once.

It still requires two separate steps. Also, compile gets slower as the program gets larger, which isn't nearly as true for Python.

Thinking it over, memory management isn't really a good argument here. Even if your program leaks, if we're talking about a job script? Who cares, the OS will clean it up when the process closes. And either way, stack allocation and references will do just fine for small jobs, you shouldn't need to be doing lots of pointer stuff for a little script.

There are many sane subsets of C++ that save us from worrying about memory. If you just keep everything on the stack or in an auto_ptr, you should do fine.

This is not true. auto_ptr (in C++11 unique_ptr) and stack discipline are not memory safe. There is no sane subset of C++ that is memory-safe. I can provide (and have provided—see my posting history) dozens of examples.

That's missing the point though. Broadly speaking, when people talk about languages with automatic storage management being "safer" they are not talking about a correctness proof of their memory handling. In fact some languages, perl among them, fail to be safe from leaks in all cases, yet no one flips out about it.

The point is practical: is the language as typically used subject to routine "accidental" memory leaks? That's surely true for C, and remains true for most C++ idioms used up until the last few years or so.

It's not true of the kind of RAII style being talked about in the linked article. In that style it's routine to write large projects that literally never call operator delete, and need to resort to an operator new only in rare circumstances (often for compatibility with older APIs).

Modern C++ when used at this level[1] really does have the same kind of casual robustness against leaks and free-memory issues that you expect to see from garbage collected environments. And it's not even hard.

[1] Which is not to say that all contemporary C++ can be written in this model. Obviously if you're doing syscall-level code you'll need to be touching memory (and probably the heap) directly. But that's sort of the point.

> The point is practical: is the language as typically used subject to routine "accidental" memory leaks? That's surely true for C, and remains true for most C++ idioms used up until the last few years or so.

If this were true, we would expect to see large C++ codebases without memory-related security vulnerabilities. But the security history of every large C++ codebase that I have seen or heard of says otherwise. I would love it to be true, but I don't think it's a tenable position that C++, even "modern" C++, is memory-safe in practice.

We can argue over whether the C++ deployed in practice is "real" modern C++, but I think that enters into no true Scotsman territory really quickly. The fact is that C++ is not memory-safe in theory and has not been shown to be memory-safe in practice. For example, I know of real security bugs in Firefox that were caused by issues that are not fixed by any "modern" C++ idioms.

> If this were true, we would expect to see large C++ codebases without memory-related security vulnerabilities.

OK, we're talking past each other. The linked article and my point was about C++'s suitability for achieving software quality in tasks that are traditionally done by "scripting" languages. Security analysis is an entirely different world, and I tend to agree that other languages have a head start there as far as memory safety.

But that said, "memory safety" is hardly a big contributor to the overall vulnerability list. C++ is much less used on web backends, and it's likewise true that almost no large web service codebase exists without non-memory-related security vulnerabilities. I don't know if there are any deployed Rust codebases of this size, but I'd expect them to have their share of whoppers too.

I agree with you that C++ is often "safe enough" for tasks that aren't security-critical: log processing or scientific computing, for example.

C++11 is so 2 years ago, C++14 for the win ;) ( https://en.wikipedia.org/wiki/C%2B%2B14 )

C++11 has many nice new things, if you are not familiar with it, I recommend starting also on the wiki: https://en.wikipedia.org/wiki/C%2B%2B11

If the C++ code I have to deal with looked like the given example, I'd be very glad. Too bad it usually looks way more complicated.

I'm struggling to think of a scripting language that requires linker and binary compatibility with supporting modules.

  # Do the same thing in Python
  from sys import argv
  open(argv[2], 'w').writelines(sorted(open(argv[1])))

That's not exactly a Pythonic example...

Except for the iteration over the vector this would be the same in C++98.

Wikipedia claims that a scripting language should be interpreted and I would argue that this means that you need a good interpreter program too. There is actually a language called Ch (http://en.wikipedia.org/wiki/Ch_(computer_programming)) that tries to achieve just that and it works pretty nicely, but it's not full-blown C++.

The article mentions:

compile time with -O3 is roughly the same as Python VM startup and has to be done only once

So I think the OP argues that there is no need for a C++ interpreter because compiling C++ code is as fast as starting a Python VM.

I disagree with the author's central claim (where's the REPL?), but he makes an interesting point.

C++ has matured to the point where it has 95% of what a scripting language needs. It wouldn't be hard to write a thin wrapper that provided the final 5%, and it would come as a welcome convenience to programmers who are used to working with the C++ libraries.

Oh. Wait a moment. It's already been done. It's called Lua. Ho hum.

I wouldn't say it's gone into scripting language territory yet, but C++ is becoming a much simpler language for everyday programming. It's still a large complicated beast with sharp corners, but that only really occurs when you delve into library-writing territory. Everyday C++ code, when written in a modern style, is just cleaner, and C++14 is going to make it even better.

C++ is always the second best language for any task!

I think the OP is trying to blur the lines of "use the best tool for the job." C/C++ are powerful languages but they usually require a build infrastructure of some kind. (make, etc.)

If you need access to a native library that isn't exposed through any other tool, sure, then writing a C++ tool is an acceptable route.

I'd expect part of the definition of "scripting language" to include being interpreted. He addresses this as an extra:

  compile time with -O3 is roughly the same as Python VM startup and has to be done only once
It's a non-scripting language hassle to have to compile. Of course, you just need a little front-end to automatically compile if needed for you. IIRC Perl actually does this.

But I like his emphasis on large standard libraries, enabling compact scripts, esp. string processing, and managed memory. Why couldn't you do this trick with C (i.e. compile & run)? It's mainly libraries, though memory management isn't natural. Actually, I could believe that many scripting languages actually started like that, but shifted to their own syntax asap.

This trick can also be done with java, by keeping a server in the background to run it to cheat the VM startup tax (and auto-compiling as needed). Java verbosity is a problem, but you can write C-like code in Java. The biggest problem is the detail of Java libraries - they give you a lot of control, but a scripting language should give you less control, in return for quick functionality (like unix `sort`).

The fact that C++ is a compiled language automatically removes it from my "scripting toolbelt". Aside from that, I just don't find it to be anywhere near as expressive as shell or Python, which is hugely important when you want to understand a script you (or somebody else) wrote several years ago.

IMHO this is a terrible example, as you should never write this: it is just Unix sort. Furthermore, I think it is even too complicated. First, I think it should read from stdin and write to stdout, and second, it should be really short, not what C++ people consider short.


  import Data.List

  main = interact (unlines . sort . lines)

I haven't written much C++ in quite some time. Does it still suffer from slow compilation? There's a module system in the works that should help a lot.


Using template-heavy code can cause really slow compilation. If you get really liberal with nice things from Boost, a simple-looking file can take a couple of minutes.

On the other hand, by modularizing the code down into libraries, and generally using incremental compilation, after an initial 'full build', minor builds during development are not too slow.

An example of my problem with C++: writing a function in a shared library that returns a standard library class, say vector<string>, to the program that calls it is very unwise.

Can you imagine if your Python modules couldn't return objects from the standard library?

This is because the shared library and calling program might have been compiled against a different version of the standard library, and also because the 'flattened' (mangled) names used to refer to members of a class are not uniform between compilers / compiler versions. You can often get away with stuff on Linux, because all the software is compiled in the same environment, but build once run anywhere? No.

This is turning into a bit of a rant. I like C++, but it has so many imperfect, jagged edges, enough to surprise programmers after 10 or 20 years. There is still a lot left to fix.

It's proportional to the size and complexity of the code base really. Precompiled headers help.

C++ is really great, but there are two major pains that I wish someone found remedies for: compile time and compiler error messages.

Lack of tooling was another one, but with libclang tons of good tools are coming.

I adhere to the traditional definition of a scripting language as being a language used for automation; i.e., to control a host application or several host applications.

why would you ignore boilerplate? they count as well! that's why anyone would skip C++ and use python. Because they don't want to write boilerplate code!

I'd be more inclined to agree if there was an example showing how to type script.cpp in the terminal and have it automatically build with g++ and run.
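
A rough sketch of what such a front-end could look like, written in Python for brevity. Everything here is an assumption for illustration: the helper names `cached_binary_path` and `run_cpp_script`, the cache directory name, a Unix-like system, and a `g++` on the PATH that accepts the usual `-std=c++11 -O2 -o` flags. It compiles a .cpp file only when no binary for that exact source text is cached, then runs it:

```python
import hashlib
import os
import subprocess
import tempfile


def cached_binary_path(source_path, cache_dir=None):
    """Map a source file to a cache location keyed by its contents."""
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "cpp-script-cache")
    with open(source_path, "rb") as src:
        digest = hashlib.sha256(src.read()).hexdigest()[:16]
    return os.path.join(cache_dir, digest)


def run_cpp_script(source_path, args=(), compiler="g++"):
    """Compile source_path if no cached binary exists, then run it."""
    binary = cached_binary_path(source_path)
    if not os.path.exists(binary):
        os.makedirs(os.path.dirname(binary), exist_ok=True)
        # check=True raises CalledProcessError if compilation fails.
        subprocess.run(
            [compiler, "-std=c++11", "-O2", "-o", binary, source_path],
            check=True,
        )
    # Return the script's own exit status to the caller.
    return subprocess.run([binary, *args]).returncode
```

Calling run_cpp_script("script.cpp", ["in.txt", "out.txt"]) would then behave like a one-step edit-to-run cycle, with the compile cost paid only when the source changes. It does nothing about extra libraries or multiple source files, which is where the objections elsewhere in the thread still apply.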

Quite an odd definition/view of what a scripting language is, but yes I like C++11 and C++14 sounds even more promising!

It's my favorite language. Python ranks a close second. With those two languages, I can do anything.

"faster than any non-JITted scripting language" <- why not JITted actually?

that actually sounds precisely like D's sweet spot - a C++-like language, but with decreased boilerplate and some high-level features that you'd expect from a "scripting language".

What does the compile time look like?

The biggest thing about a real scripting language is that I have one step to go from edit to run.

